Research on Scheduling Scheme for Hadoop Clusters
نویسندگان
چکیده
In this paper, we import a prefetching mechanism into MapReduce model while retaining compatibility with the native Hadoop. Given a dataintensive application running on a Hadoop cluster, our approach estimates the execution time of each task and adaptively preloads an amount of data to the memory before the new task is assigned to the computing node.
منابع مشابه
An Optimal Task Assignment Policy and Performance Diagnosis Strategy for Heterogeneous Hadoop Cluster
The goal of the proposed research is to improve the performance of Hadoop-based software running on a heterogeneous cluster. My approach lies in the intersection of machine learning, scheduling and diagnosis. We mainly focus on heterogeneous Hadoop clusters and try to improve the performance by implementing a more efficient scheduler for this class of cluster.
متن کاملThroughputScheduler: Learning to Schedule on Heterogeneous Hadoop Clusters
Hadoop is the de-facto standard for big data analytics applications. Presently available schedulers for Hadoop clusters assign tasks to nodes without regard to the capability of the nodes. We propose ThroughputScheduler, which reduces the overall job completion time on a clusters of heterogeneous nodes by actively scheduling tasks on nodes based on optimally matching job requirements to node ca...
متن کاملMaximizing Data Locality in Hadoop Clusters via Controlled Reduce Task Scheduling
The overall goal of this project is to gain a hands-on experience with working on a large open-ended research-oriented project using the Hadoop framework. Hadoop is an open source implementation of MapReduce and Google File System, and is currently enjoying wide popularity. Students will modify the task scheduler of Hadoop, conduct several experimental studies, and analyze performance and netwo...
متن کاملSome Workload Scheduling Alternatives in a High Performance Computing Environment
Clusters of commodity microprocessors have overtaken custom-designed systems as the high performance computing (HPC) platform of choice. The design and optimization of workload scheduling systems for clusters has been an active research area. This paper surveys some examples of workload scheduling methods used in large-scale applications such as Google, Yahoo, and Amazon that use a MapReduce pa...
متن کاملAn adaptive scheduling algorithm for dynamic heterogeneous Hadoop systems
The MapReduce and Hadoop frameworks were designed to support efficient large scale computations. There has been growing interest in employing Hadoop clusters for various diverse applications. A large number of (heterogeneous) clients, using the same Hadoop cluster, can result in tensions between the various performance metrics by which such systems are measured. On the one hand, from the servic...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013